TorsionNet: A Reinforcement Learning Approach to Sequential Conformer Search
Molecular geometry prediction of flexible molecules, or conformer search, is a long-standing challenge in computational chemistry. This task is of great importance for predicting structure-activity relationships for a wide variety of substances ranging from biomolecules to ubiquitous materials. Substantial computational resources are invested in Monte Carlo and Molecular Dynamics methods to generate diverse and representative conformer sets for medium to large molecules, which are yet intractable to chemoinformatic conformer search methods. We present TorsionNet, an efficient sequential conformer search technique based on reinforcement learning under the rigid rotor approximation. The model is trained via curriculum learning, whose theoretical benefit is explored in detail, to maximize a novel metric grounded in thermodynamics called the Gibbs Score. Our experimental results show that TorsionNet outperforms the highest-scoring chemoinformatics method by 4x on large branched alkanes, and by several orders of magnitude on the previously unexplored biopolymer lignin, with applications in renewable energy. TorsionNet also outperforms the far more exhaustive but computationally intensive Self-Guided Molecular Dynamics sampling method.
Training Emergent Joint Associations: A Reinforcement Learning Approach to Creative Thinking in Language Models
Singh, Mukul, Singha, Ananya, Parab, Aishni, Mehrotra, Pronita, Gulwani, Sumit
Associative thinking--the ability to connect seemingly unrelated ideas--is a foundational element of human creativity and problem-solving. This paper explores whether reinforcement learning (RL) guided by associative thinking principles can enhance a model's performance across diverse generative tasks, including story writing, code generation, and chart creation. We introduce a reinforcement learning framework that uses a prompt-based evaluation mechanism, incorporating established divergent thinking metrics from creativity research. A base language model is fine-tuned using this framework to reward outputs demonstrating higher novelty through higher degrees of conceptual connectivity. Interestingly, the experimental results suggest that RL-based associative thinking-trained models not only generate more original and coherent stories but also exhibit improved abstraction and flexibility in tasks such as programming and data visualization. Our findings provide initial evidence that modeling cognitive creativity principles through reinforcement learning can yield more adaptive and generative AI.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)
Context-Emotion Aware Therapeutic Dialogue Generation: A Multi-component Reinforcement Learning Approach to Language Models for Mental Health Support
Zhang, Eric Hua Qing, Ive, Julia
Mental illness represents a substantial global socioeconomic burden, with COVID-19 further exacerbating accessibility challenges and driving increased demand for telehealth mental health support. While large language models (LLMs) offer promising solutions through 24/7 availability and non-judgmental interactions, pre-trained models often lack the contextual and emotional awareness necessary for appropriate therapeutic responses. This paper investigates the application of supervised fine-tuning (SFT) and reinforcement learning (RL) techniques to enhance GPT-2's capacity for therapeutic dialogue generation. The methodology restructured input formats to enable simultaneous processing of contextual information and emotional states alongside user input, employing a multi-component reward function that aligned model outputs with professional therapist responses and annotated emotions. Results demonstrated improvements through reinforcement learning over baseline GPT-2 across multiple evaluation metrics: BLEU (0.0111), ROUGE-1 (0.1397), ROUGE-2 (0.0213), ROUGE-L (0.1317), and METEOR (0.0581). LLM evaluation confirmed high contextual relevance and professionalism, while reinforcement learning achieved 99.34% emotion accuracy compared to 66.96% for baseline GPT-2. These findings demonstrate reinforcement learning's effectiveness in developing therapeutic dialogue systems that can serve as valuable assistive tools for therapists while maintaining essential human clinical oversight. The code and appendices are publicly available at: https://github.com/ez
- North America > United States (1.00)
- Europe (1.00)
- Research Report > Promising Solution (0.66)
- Research Report > New Finding (0.48)
- Health & Medicine > Health Care Technology > Telehealth (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.48)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
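The multi-component reward described in the abstract can be sketched as follows. The components, weights, and scoring functions here are illustrative assumptions, not the paper's exact formulation: a lexical-overlap term stands in for alignment with the professional therapist response, and a binary term rewards agreement with the annotated emotion.

```python
def token_f1(candidate, reference):
    # Overlap-based similarity: a crude stand-in for BLEU/ROUGE-style
    # alignment with the professional therapist reference response.
    cand, ref = candidate.lower().split(), reference.lower().split()
    overlap = len(set(cand) & set(ref))
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def therapeutic_reward(response, reference_response,
                       predicted_emotion, annotated_emotion,
                       w_align=0.5, w_emotion=0.5):
    """Hypothetical multi-component reward: a weighted sum of alignment
    with the therapist reference and agreement with the annotated emotion."""
    alignment = token_f1(response, reference_response)
    emotion_match = 1.0 if predicted_emotion == annotated_emotion else 0.0
    return w_align * alignment + w_emotion * emotion_match
```

A reward of this shape gives the RL fine-tuning loop a single scalar per generated response while still letting each component (alignment, emotion) be weighted independently.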
RL as Regressor: A Reinforcement Learning Approach for Function Approximation
Standard regression techniques, while powerful, are often constrained by predefined, differentiable loss functions such as mean squared error. These functions may not fully capture the desired behavior of a system, especially when dealing with asymmetric costs or complex, non-differentiable objectives. In this paper, we explore an alternative paradigm: framing regression as a Reinforcement Learning (RL) problem. By treating a model's prediction as an action and defining a custom reward signal based on the prediction error, we can leverage powerful RL algorithms to perform function approximation. Through a progressive case study of learning a noisy sine wave, we illustrate the development of an Actor-Critic agent, iteratively enhancing it with Prioritized Experience Replay, increased network capacity, and positional encoding to yield an RL agent capable of this regression task. Our results show that the RL framework not only successfully solves the regression problem but also offers enhanced flexibility in defining objectives and guiding the learning process.
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.61)
SalesRLAgent: A Reinforcement Learning Approach for Real-Time Sales Conversion Prediction and Optimization
Current approaches to sales conversation analysis and conversion prediction typically rely on Large Language Models (LLMs) combined with basic retrieval augmented generation (RAG). These systems, while capable of answering questions, fail to accurately predict conversion probability or provide strategic guidance in real time. In this paper, we present SalesRLAgent, a novel framework leveraging specialized reinforcement learning to predict conversion probability throughout sales conversations. Unlike systems from Kapa.ai, Mendable, Inkeep, and others that primarily use off-the-shelf LLMs for content generation, our approach treats conversion prediction as a sequential decision problem, training on synthetic data generated using GPT-4O to develop a specialized probability estimation model. Our system incorporates Azure OpenAI embeddings (3072 dimensions), turn-by-turn state tracking, and meta-learning capabilities to understand its own knowledge boundaries. Evaluations demonstrate that SalesRLAgent achieves 96.7% accuracy in conversion prediction, outperforming LLM-only approaches by 34.7% while offering significantly faster inference (85ms vs 3450ms for GPT-4). Furthermore, integration with existing sales platforms shows a 43.2% increase in conversion rates when representatives utilize our system's real-time guidance. SalesRLAgent represents a fundamental shift from content generation to strategic sales intelligence, providing moment-by-moment conversion probability estimation with actionable insights for sales professionals.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
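The turn-by-turn state tracking described in the abstract can be sketched roughly as follows. Everything here is an assumption for illustration: the real system uses 3072-dimensional Azure OpenAI embeddings and an RL-trained estimator, whereas this toy uses a hash-based stand-in embedding and an untrained linear head.

```python
import hashlib
import math

EMBED_DIM = 32  # stand-in for the 3072-dim Azure OpenAI embeddings

def toy_embedding(text):
    """Deterministic stand-in for a real embedding API call."""
    vec = []
    for i in range(EMBED_DIM):
        h = hashlib.sha256(f"{i}:{text}".encode()).digest()
        vec.append(int.from_bytes(h[:4], "big") / 2**32 - 0.5)
    return vec

class ConversionTracker:
    """Turn-by-turn state tracking: the conversation state is an
    exponential moving average of turn embeddings, mapped to a
    conversion probability by a (here untrained) linear head."""
    def __init__(self, decay=0.8):
        self.decay = decay
        self.state = [0.0] * EMBED_DIM
        self.weights = [0.1] * EMBED_DIM  # would be learned via RL

    def observe_turn(self, utterance):
        emb = toy_embedding(utterance)
        self.state = [self.decay * s + (1 - self.decay) * e
                      for s, e in zip(self.state, emb)]
        return self.conversion_probability()

    def conversion_probability(self):
        score = sum(w * s for w, s in zip(self.weights, self.state))
        return 1.0 / (1.0 + math.exp(-score))  # sigmoid -> [0, 1]
```

The sequential-decision framing shows up in the interface: every turn updates the state and yields a fresh probability estimate, which is what enables the real-time guidance the paper evaluates.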
Review for NeurIPS paper: TorsionNet: A Reinforcement Learning Approach to Sequential Conformer Search
Weaknesses: Because the idea is new and very interesting, a number of topics came up that could or should be addressed. Is there a way to be certain that the gradient descent using MMFF keeps the molecule in the same basin of the PES that the rigid rotor sampled? It is likely, particularly in crowded conformations, that the structure and energy that MMFF reports are not for the same internal angles as the initial torsion angles would suggest. The Gibbs Score is introduced as a completely new idea, but it is essentially related to a (relative) population according to Maxwell–Boltzmann statistics. Furthermore, the log of the Gibbs Score is then a relative free energy, a very intuitive connection with the underlying physics.
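The reviewer's connection can be written down explicitly. Assuming the Gibbs Score aggregates Boltzmann weights of the conformers found (this notation is illustrative, not the paper's exact definition):

```latex
% Boltzmann weight (relative population) of conformer i with energy E_i:
w_i = \frac{e^{-E_i / k_B T}}{\sum_j e^{-E_j / k_B T}}

% A Gibbs-Score-like quantity as a sum of Boltzmann factors
% relative to a reference energy E_0:
G = \sum_i e^{-(E_i - E_0) / k_B T}

% With Z = \sum_i e^{-E_i / k_B T} and F = -k_B T \ln Z, taking the log
% recovers a relative free energy, as the reviewer notes:
-k_B T \ln G = F - E_0
```

Under this reading, maximizing the score is equivalent to minimizing the free energy of the discovered conformer ensemble relative to the reference.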
Review for NeurIPS paper: TorsionNet: A Reinforcement Learning Approach to Sequential Conformer Search
The reviewers found this paper to be interesting and compelling, nicely summarized by R2 in discussion: "I think the method is sound and exciting, and the key challenges in transferability live in the availability of (high-accuracy) training data and in the challenges of representation learning for molecules (GCNs need to be exposed to a lot of chemical variability to be able to interpolate in chemical space). The alkanes are essentially the same bond over and over, and lignin is trained and tested in the same chemical space. I insist that these are representation learning challenges to be solved by the community, and improvements there could be combined with this RL approach." That said, the reviewers did find several areas where the paper can be improved. Because of space limitations, I understand that not all of these suggestions can be incorporated within page limits, but I do expect the authors to address as much as possible within the main final text, with all feedback addressed either in the main text or in a supplementary appendix.
Enhancing Disaster Resilience with UAV-Assisted Edge Computing: A Reinforcement Learning Approach to Managing Heterogeneous Edge Devices
Azfar, Talha, Huang, Kaicong, Ke, Ruimin
Edge sensing and computing is rapidly becoming part of intelligent infrastructure architecture leading to operational reliance on such systems in disaster or emergency situations. In such scenarios there is a high chance of power supply failure due to power grid issues, and communication system issues due to base stations losing power or being damaged by the elements, e.g., flooding, wildfires etc. Mobile edge computing in the form of unmanned aerial vehicles (UAVs) has been proposed to provide computation offloading from these devices to conserve their battery, while the use of UAVs as relay network nodes has also been investigated previously. This paper considers the use of UAVs with further constraints on power and connectivity to prolong the life of the network while also ensuring that the data is received from the edge nodes in a timely manner. Reinforcement learning is used to investigate numerous scenarios of various levels of power and communication failure. This approach is able to identify the device most likely to fail in a given scenario, thus providing priority guidance for maintenance personnel. The evacuations of a rural town and urban downtown area are also simulated to demonstrate the effectiveness of the approach at extending the life of the most critical edge devices.
- North America > United States > New York > Rensselaer County > Troy (0.04)
- Asia > Middle East > Yemen > Amanat Al Asimah > Sanaa (0.04)
- Information Technology (1.00)
- Energy > Energy Storage (0.46)
- Energy > Power Industry (0.34)
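The trade-off in the abstract above (conserving device batteries while keeping data delivery timely) can be sketched as a per-step reward for the UAV agent. The terms, weights, and penalty below are illustrative assumptions, not the authors' formulation:

```python
def uav_step_reward(battery_levels, data_age, served_device,
                    w_life=1.0, w_latency=0.5, failure_penalty=10.0):
    """Hypothetical reward for a UAV agent choosing which edge device to
    serve: reward network longevity (min battery) and penalize stale data."""
    # Serving a device resets the age of its data.
    data_age = dict(data_age)
    data_age[served_device] = 0
    # Longevity term: the weakest device dictates network lifetime.
    life = min(battery_levels.values())
    # Latency term: total staleness across devices.
    staleness = sum(data_age.values())
    reward = w_life * life - w_latency * staleness
    # Heavy penalty if any device has fully drained (network "failure").
    if life <= 0.0:
        reward -= failure_penalty
    return reward
```

Because the longevity term depends on the minimum battery level, a policy trained against it naturally concentrates on the device closest to failure — consistent with the paper's use of the agent to flag the device most likely to fail.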
Optimizing Low-Speed Autonomous Driving: A Reinforcement Learning Approach to Route Stability and Maximum Speed
Li, Benny Bao-Sheng, Wu, Elena, Yang, Hins Shao-Xuan, Liang, Nicky Yao-Jin
Autonomous driving has garnered significant attention in recent years, especially in optimizing vehicle performance under varying conditions. This paper addresses the challenge of maintaining maximum-speed stability in low-speed autonomous driving while following a predefined route. Leveraging reinforcement learning (RL), we propose a novel approach to optimize driving policies that enable the vehicle to achieve near-maximum speed without compromising stability.

Reinforcement Learning (RL) has become a powerful approach for addressing complex decision-making challenges in autonomous systems, particularly in low-speed scenarios. Unlike high-speed driving, low-speed environments demand high precision, safety, and stability [7] due to dynamic obstacles and confined spaces. This paper explores several applications of RL in low-speed contexts, demonstrating its potential to enhance performance in various tasks.
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report (1.00)
- Overview > Innovation (0.34)
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks (1.00)
- Information Technology > Robotics & Automation (0.83)